Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: add to_offset method to Timedelta #9064 #9226

Merged
merged 1 commit into from
Jan 18, 2015
Merged

ENH: add to_offset method to Timedelta #9064 #9226

merged 1 commit into from
Jan 18, 2015

Conversation

tvyomkesh
Copy link
Contributor

Closes #9064

@jorisvandenbossche
Copy link
Member

Another option is to ensure it is first a pd.Timedelta and then use total_seconds (this always has that, also on python 2.6).

Some other things:

  • can you update the docstring of to_offset
  • needs some more tests I think. Certainly test for both datetime.timedelta/pandas.Timedelta input
  • I was wondering if it would be possible that eg to_offset(pd.Timedelta('1 day')) would return a 1 Day offset instead of 86400 Seconds offset.

cc @bjonen @cancan101

@jorisvandenbossche jorisvandenbossche added the Frequency DateOffsets label Jan 13, 2015
@jorisvandenbossche jorisvandenbossche added this to the 0.16.0 milestone Jan 13, 2015
@jreback
Copy link
Contributor

jreback commented Jan 13, 2015

your method will not handle < 1 sec if its not divisible. better to start go thru the time hierarchy and add offsets. e.g. pd.Timedelta('1 day 1 sec') == Day(1) + Second(1), you can use the .components property to get these.

@tvyomkesh
Copy link
Contributor Author

@jorisvandenbossche working on these

  • add handling of pd.Timedelta objects.
  • update the docstring of to_offset.
  • Looking into keeping the resolution same e.g., Day(1) to Day(1) instead of Second(86400). I guess this would be taken care of if I could traverse the heirarchy and add each offset as it is (suggestion by @jreback)

@jreback my initial understanding was that converting everything into total_seconds would serve the purpose. However, going through the heirarchy and adding the offsets certainly sounds like a better way. Looking into this.

Thanks,

@tvyomkesh
Copy link
Contributor Author

@jorisvandenbossche I will add more tests.

@tvyomkesh
Copy link
Contributor Author

A question. @jreback when you say .components, are you referring to something similar to what I found in timedeltas.rst? -> "You can use the .components property to access a reduced form of the timedelta. This returns a DataFrame indexed similarly to the Series" or did you mean directly use the attributes days,hours,minutes,seconds,milliseconds,microseconds,nanoseconds? Thanks.

@jreback
Copy link
Contributor

jreback commented Jan 14, 2015

.components of a TimeDelta is a named tuple with the data u need

@jorisvandenbossche
Copy link
Member

you can better use the .components, as the others we are probably going to change.

@tvyomkesh
Copy link
Contributor Author

Ok thanks. I am seeing an inconsistency though. pd.Timedelta doesn't allow nanoseconds in constructor but its components list till nanoseconds. Here is what I can reproduce.

Python 2.7.9 (v2.7.9:648dcafa7e5f, Dec 10 2014, 10:10:46)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from pandas.tslib import Timedelta
>>> td = Timedelta(nanoseconds=1)
Traceback (most recent call last):
File "", line 1, in 
File "pandas/tslib.pyx", line 1723, in pandas.tslib.Timedelta.__new__ (pandas/tslib.c:29743)
raise ValueError("cannot construct a TimeDelta from the passed arguments, allowed keywords 
are " [days, seconds, microseconds, milliseconds, minutes, hours, weeks]
>>> td=Timedelta(seconds=1)
>>> td.components._fields
('days', 'hours', 'minutes', 'seconds', 'milliseconds', 'microseconds', 'nanoseconds')

@jreback
Copy link
Contributor

jreback commented Jan 14, 2015

hmm
nanoseconds is not allowed as an argument to datetime.timedelta but I t should be allowed here

u can add that if u want (and a separate test)

@tvyomkesh
Copy link
Contributor Author

Also, if I try to add Nano() DateOffset to any other pd.DateOffset on my machine, I get this error: TypeError: ufunc add cannot use operands with types dtype('O') and dtype('<m8[ns]'). Something I need to fix in my build etc.?

Example:

>>> from pandas import offsets
>>> d1=offsets.Second(1)
>>> d2=offsets.Milli(1)
>>> d3=offsets.Micro(1)
>>> d4=offsets.Nano(1)
>>> 5*d1+4*d2+3*d3+2*d4
Traceback (most recent call last):
  File "", line 1, in 
  File "pandas/tseries/offsets.py", line 2027, in __add__
    return _delta_to_tick(self.delta + other.delta)
TypeError: ufunc add cannot use operands with types dtype('O') and dtype('<m8[ns]')
>>> 5*d1+4*d2+3*d3
<5004003 * Micros>
>>> 5*d1+4*d2+0*d3
<5004 * Millis>
>>> 5*d1+4*d2+1*d3
<5004001 * Micros>

Thanks,

@tvyomkesh
Copy link
Contributor Author

Thanks @jreback I will club adding nanoseconds in constructor to this change.

@jreback
Copy link
Contributor

jreback commented Jan 14, 2015

hmm adding a Nano should work
maybe not converted properly (and clearly not tested)

@tvyomkesh
Copy link
Contributor Author

Ok thanks, I will need to look into these more and see if I spot a fix. I guess these need to be fixed first to properly traverse hierarchy and add offsets otherwise things seem to be working fine till microseconds.

@tvyomkesh
Copy link
Contributor Author

@jorisvandenbossche, @jreback I have updated the PR with the following changes:

  • to_offset can take either datetime.timedelta or pd.Timedelta as input
  • changed doc string of the function accordingly
  • fixed output time resolution to be the same as input
  • added more tests

PLMK if there are better ways to get DateOffset objects from td.components._fields. I have created a local name to offset object map in the method. Will attend to review comments as they come by. In the meantime, I will look into: a) why adding Nano to DateOffset is failing and b) adding support of Nano in pd.Timedelta constructor.

@@ -278,7 +278,8 @@ def get_period_alias(offset_str):

def to_offset(freqstr):
"""
Return DateOffset object from string representation
Return DateOffset object from string representation or
datetime.timedelta or pandas.tslib.Timedelta objects
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pandas.tslib.Timedelta -> pandas.Timedelta (that is how it is exposed to the users)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@tvyomkesh
Copy link
Contributor Author

From what I see, support for nano is broken in Timedelta and I am able to reproduce that in various ways. However, it seems like this ENH should should not need any further changes related to the nano fixes. Would it be a good idea to treat these two separately? I can go ahead and create a separate issue for the nano fix and continue the fix related discussions there, while this ENH can be merged independently.

@@ -298,6 +304,28 @@ def to_offset(freqstr):
name, stride = stride, name
name, _ = _base_and_stride(name)
delta = get_offset(name) * stride

elif isinstance(freqstr, tslib.Timedelta):
name_to_offset_map = {'days': Day(1),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put this as a module level variable

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds good. done.

@jreback
Copy link
Contributor

jreback commented Jan 16, 2015

couple of minor changes, pls rebase and squash.

delta = None

if isinstance(freqstr, timedelta):
freqstr = tslib.Timedelta(freqstr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test is not needed, you can just do isinstance(freqstr, timedelta) below, and then first convert to Timedelta (pandas.Timedelta will also be an instance of timedelta)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense. done

@tvyomkesh
Copy link
Contributor Author

Incorporated review comments and squashed. Created a new issue to track nano fix here. #9273. Thanks.

@jreback
Copy link
Contributor

jreback commented Jan 16, 2015

ok, looks good. Pls add a release note referencing the original issue. You can put it in the enhancements section.

@tvyomkesh
Copy link
Contributor Author

@jreback done adding note

@@ -99,6 +99,7 @@ Enhancements
- Added time interval selection in get_data_yahoo (:issue:`9071`)
- Added ``Series.str.slice_replace()``, which previously raised NotImplementedError (:issue:`8888`)
- Added ``Timestamp.to_datetime64()`` to complement ``Timedelta.to_timedelta64()`` (:issue:`9255`)
- ``pandas.tseries.frequencies.to_offset()`` now accepts ``pandas.Timdelta`` or ``datetime.timedelta`` as input (:issue:`9064`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can just say Timedelta (spelling too)

@jreback
Copy link
Contributor

jreback commented Jan 17, 2015

minor import correction, otherwise looks good. ping after you push and its green.

to_offset now accepts Timedelta as input
@tvyomkesh
Copy link
Contributor Author

Ok, changed imports, docstring and release note. Thanks.

jreback added a commit that referenced this pull request Jan 18, 2015
ENH: add to_offset method to Timedelta #9064
@jreback jreback merged commit 0523e7c into pandas-dev:master Jan 18, 2015
@jreback
Copy link
Contributor

jreback commented Jan 18, 2015

@tvyomkesh thanks!

@tvyomkesh tvyomkesh deleted the vyom/firstpr branch January 19, 2015 02:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions Enhancement Frequency DateOffsets Timedelta Timedelta data type
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: add to_offset method to Timedelta
3 participants